How to create and run a gap-filled FBA from PATRIC

The PATRIC (the Pathosystems Resource Integration Center) contains the best collection of well annotated genomes. They also happen to have been annotated by RAST, and so we should be able to use those integrations directly.

Here we'll walk through taking a genome from PATRIC, building a model, and running it. PATRIC also has model reconstruction built in, but when I tried it (05/24/16) it was not working.

As usual, we'll start by loading some modules that we'll need for our analysis.


In [1]:
import sys
import os
import copy
import PyFBA

Find a genome and download the annotations

You need to find your genome in PATRIC and download the annotations.

Once you have identified the genome you would like to build the model for, choose Feature Table from the menu bar:

Next, choose Download and save as a text file (.txt).

That will save a file called FeatureTable.txt to your Downloads location. That file has the following columns:

| Genome | Genome ID | Accession | PATRIC ID | RefSeq Locus Tag | Alt Locus Tag | Feature ID | | Annotation | Feature Type | Start | End | Length | Strand | FIGfam ID | | PATRIC genus-specific families (PLfams) | PATRIC cross-genus families (PGfams) | Protein ID | AA Length | Gene Symbol | Product |

The key columns are PATRIC ID (Column 3) and Product (Column 19) [Column numbers are 0 based!]

Now that we know that, we need to convert these feature names into functional roles. The key here is to split on adjoiners, such as ' / ', ' # ', and ' @ '.


In [46]:
assigned_functions = {}
with open(os.path.join(os.environ['HOME'], 'Downloads/FeatureTable.txt'), 'r') as f:
    for l in f:
        p=l.strip().split("\t")
        assigned_functions[p[3]]=PyFBA.parse.roles_of_function(p[19])
roles = set([i[0] for i in [list(j) for j in assigned_functions.values()]])
print("There are {} unique roles in this genome".format(len(roles)))


There are 3649 unique roles in this genome

Next, we convert those roles to reactions. We start with a dict of roles and reactions, but we only need a list of unique reactions, so we convert the keys to a set.


In [47]:
roles_to_reactions = PyFBA.filters.roles_to_reactions(roles)
reactions_to_run = set()
for role in roles_to_reactions:
    reactions_to_run.update(roles_to_reactions[role])
print("There are {}".format(len(reactions_to_run)) +
      " unique reactions associated with this genome".format(len(reactions_to_run)))


There are 1337 unique reactions associated with this genome

In [51]:
reactions_to_run.add('rxn13681')
print("There are {}".format(len(reactions_to_run)) +
      " unique reactions associated with this genome".format(len(reactions_to_run)))


There are 1338 unique reactions associated with this genome

Read all the reactions and compounds in our database

We read all the reactions, compounds, and enzymes in the ModelSEEDDatabase into three data structures. Each one is a dictionary with a string representation of the object as the key and the PyFBA object as the value.

We modify the reactions specifically for Gram negative models (there are also options for Gram positive models, Mycobacterial models, general microbial models, and plant models).


In [26]:
compounds, reactions, enzymes = \
    PyFBA.parse.model_seed.compounds_reactions_enzymes('gramnegative')

Update reactions to run, making sure that all reactions are in the list!

There are some reactions that come from functional roles that do not appear in the reactions list. We're working on tracking these down, but for now we just check that all reaction IDs in reactions_to_run are in reactions, too.


In [52]:
tempset = set()
for r in reactions_to_run:
    if r in reactions:
        tempset.add(r)
    else:
        sys.stderr.write("Reaction ID {} is not in our reactions list. Skipped\n".format(r))
reactions_to_run = tempset

Test whether these reactions grow on ArgonneLB media

We can test whether this set of reactions grows on ArgonneLB media. The media is the same one we used above, and you can download the ArgonneLB.txt and text file and put it in the same directory as this iPython notebook to run it.

(Note: we don't need to convert the media components, because the media and compounds come from the same source.)


In [6]:
media = PyFBA.parse.read_media_file('ArgonneLB.txt')
print("Our media has {} components".format(len(media)))


Our media has 65 components

Define a biomass equation

The biomass equation is the part that says whether the model will grow! This is a metabolism.reaction.Reaction object.


In [7]:
biomass_equation = PyFBA.metabolism.biomass_equation('gramnegative')

Run the FBA

With the reactions, compounds, reactions_to_run, media, and biomass model, we can test whether the model grows on this media.


In [53]:
status, value, growth = PyFBA.fba.run_fba(compounds, reactions, reactions_to_run,
                                          media, biomass_equation)
print("Initial run has a biomass flux value of {} --> Growth: {}".format(value, growth))


Initial run has a biomass flux value of 0.0 --> Growth: False

Gap-fill the model

Since the model does not grow on ArgonneLB we need to gap-fill it to ensure growth. There are several ways that we can gap-fill, and we will work through them until we get growth.

As you will see, we update the reactions_to_run list each time, and keep the media and everything else consistent. Then we just need to run the FBA like we have done above and see if we get growth.

We also keep a copy of the original reactions_to_run, and a list with all the reactions that we are adding, so once we are done we can go back and bisect the reactions that are added.


In [54]:
added_reactions = []
original_reactions_to_run = copy.copy(reactions_to_run)

In [49]:
reactions_to_run = copy.copy(original_reactions_to_run)

Media import reactions

We need to make sure that the cell can import everything that is in the media... otherwise it won't be able to grow. Be sure to only do this step if you are certain that the cell can grow on the media you are testing.


In [55]:
media_reactions = PyFBA.gapfill.suggest_from_media(compounds, reactions,
                                                   reactions_to_run, media)
added_reactions.append(("media", media_reactions))
reactions_to_run.update(media_reactions)

In [56]:
status, value, growth = PyFBA.fba.run_fba(compounds, reactions, reactions_to_run,
                                          media, biomass_equation)
print("Run has a biomass flux value of {} --> Growth: {}".format(value, growth))


Run has a biomass flux value of 0.0 --> Growth: False

Essential reactions

There are ~100 reactions that are in every model we have tested, and we construe these to be essential for all models, so we typically add these next!


In [57]:
essential_reactions = PyFBA.gapfill.suggest_essential_reactions()
added_reactions.append(("essential", essential_reactions))
reactions_to_run.update(essential_reactions)

In [58]:
status, value, growth = PyFBA.fba.run_fba(compounds, reactions, reactions_to_run,
                                          media, biomass_equation)
print("Run has a biomass flux value of {} --> Growth: {}".format(value, growth))


Run has a biomass flux value of 0.0 --> Growth: False

Subsystems

The reactions connect us to subsystems (see Overbeek et al. 2014), and this test ensures that all the subsystems are complete. We add reactions required to complete the subsystem.


In [59]:
subsystem_reactions = \
    PyFBA.gapfill.suggest_reactions_from_subsystems(reactions,
                                                    reactions_to_run,
                                                    threshold=0.5)
added_reactions.append(("subsystems", subsystem_reactions))
reactions_to_run.update(subsystem_reactions)

In [60]:
status, value, growth = PyFBA.fba.run_fba(compounds, reactions, reactions_to_run,
                                          media, biomass_equation)
print("Run has a biomass flux value of {} --> Growth: {}".format(value, growth))


Run has a biomass flux value of 858.871559637 --> Growth: True

In [42]:
pre_orphan=copy.copy(reactions_to_run)
pre_o_added=copy.copy(added_reactions)
print("Pre orphan has {} reactions".format(len(pre_orphan)))


Pre orphan has 2075 reactions

In [21]:
reactions_to_run=copy.copy(pre_orphan)
added_reactions = copy.copy(pre_o_added)

Orphan compounds

Orphan compounds are those compounds which are only associated with one reaction. They are either produced, or trying to be consumed. We need to add reaction(s) that complete the network of those compounds.

You can change the maximum number of reactions that a compound is in to be considered an orphan (try increasing it to 2 or 3).


In [43]:
orphan_reactions = PyFBA.gapfill.suggest_by_compound(compounds, reactions,
                                                     reactions_to_run,
                                                     max_reactions=1)
added_reactions.append(("orphans", orphan_reactions))
reactions_to_run.update(orphan_reactions)
print("Post orphan has {} reactions".format(len(reactions_to_run)))


Post orphan has 5001 reactions

In [44]:
status, value, growth = PyFBA.fba.run_fba(compounds, reactions, reactions_to_run,
                                          media, biomass_equation)
print("Run has a biomass flux value of {} --> Growth: {}".format(value, growth))


Run has a biomass flux value of 1000.0 --> Growth: True

Trimming the model

Now that the model has been shown to grow on ArgonneLB media after several gap-fill iterations, we should trim down the reactions to only the required reactions necessary to observe growth.


In [45]:
reqd_additional = set()

# Begin loop through all gap-filled reactions
while added_reactions:
    ori = copy.copy(original_reactions_to_run)
    ori.update(reqd_additional)
    # Test next set of gap-filled reactions
    # Each set is based on a method described above
    how, new = added_reactions.pop()
    sys.stderr.write("Testing reactions from {}\n".format(how))
    
    # Get all the other gap-filled reactions we need to add
    for tple in added_reactions:
        ori.update(tple[1])
    
    # Use minimization function to determine the minimal
    # set of gap-filled reactions from the current method
    new_essential = PyFBA.gapfill.minimize_additional_reactions(ori, new, compounds,
                                                                reactions, media,
                                                                biomass_equation)
    sys.stderr.write("Saved {} reactions from {}\n".format(len(new_essential), how))
    for r in new_essential:
        sys.stderr.write(r + "\n")
    # Record the method used to determine
    # how the reaction was gap-filled
    for new_r in new_essential:
        reactions[new_r].is_gapfilled = True
        reactions[new_r].gapfill_method = how
    reqd_additional.update(new_essential)

# Combine old and new reactions
all_reactions = original_reactions_to_run.union(reqd_additional)


Testing reactions from orphans
At the beginning the base list has 2075  and the optional list has 2926 reactions
Saved 1 reactions from orphans
rxn05166
Testing reactions from subsystems
At the beginning the base list has 1535  and the optional list has 434 reactions
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-45-a0bee7cb6d8c> in <module>()
     18     new_essential = PyFBA.gapfill.minimize_additional_reactions(ori, new, compounds,
     19                                                                 reactions, media,
---> 20                                                                 biomass_equation)
     21     sys.stderr.write("Saved {} reactions from {}\n".format(len(new_essential), how))
     22     for r in new_essential:

/data/PyFBA/PyFBA/gapfill/reaction_minimization.pyc in minimize_additional_reactions(base_reactions, optional_reactions, compounds, reactions, media, biomass_eqn, verbose)
    199                 uneven_test = True
    200                 if len(current_rx_list) < 20:
--> 201                     left = iterate_reactions_to_run(base_reactions, optional_reactions, compounds, reactions, media, biomass_eqn, verbose)
    202                     right = []
    203                     test = False

/data/PyFBA/PyFBA/gapfill/reaction_minimization.pyc in iterate_reactions_to_run(base_reactions, optional_reactions, compounds, reactions, media, biomass_eqn, verbose)
     94         removed_reaction = optional_reactions.pop()
     95         r2r = base_reactions.union(optional_reactions).union(required_optionals)
---> 96         status, value, growth = PyFBA.fba.run_fba(compounds, reactions, r2r, media, biomass_eqn)
     97         if not growth:
     98             required_optionals.add(removed_reaction)

/data/PyFBA/PyFBA/fba/run_fba.pyc in run_fba(compounds, reactions, reactions_to_run, media, biomass_equation, uptake_secretion, verbose)
     31 
     32     cp, rc, reactions = PyFBA.fba.create_stoichiometric_matrix(reactions_to_run, reactions, compounds, media, biomass_equation,
---> 33                                                      uptake_secretion, verbose=False)
     34 
     35     rbvals = PyFBA.fba.reaction_bounds(reactions, rc, media)

/data/PyFBA/PyFBA/fba/create_stoichiometric_matrix.pyc in create_stoichiometric_matrix(reactions_to_run, reactions, compounds, media, biomass_equation, uptake_secretion, verbose)
    119 
    120     # load the data into the model
--> 121     PyFBA.lp.load(data, cp, rc)
    122 
    123     # now set the objective function.It is the biomass_equation

KeyboardInterrupt: 

In [ ]:
status, value, growth = PyFBA.fba.run_fba(compounds, reactions, all_reactions,
                                          media, biomass_equation)
print("The biomass reaction has a flux of {} --> Growth: {}".format(value, growth))